Univariate Plots Section

## [1] 1599   12
## tibble [1,599 x 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ fixed.acidity       : num [1:1599] 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num [1:1599] 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num [1:1599] 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num [1:1599] 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num [1:1599] 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num [1:1599] 11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num [1:1599] 34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num [1:1599] 0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num [1:1599] 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num [1:1599] 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num [1:1599] 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : Ord.factor w/ 6 levels "3"<"4"<"5"<"6"<..: 3 3 3 4 3 3 3 5 5 3 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   fixed.acidity = col_double(),
##   ..   volatile.acidity = col_double(),
##   ..   citric.acid = col_double(),
##   ..   residual.sugar = col_double(),
##   ..   chlorides = col_double(),
##   ..   free.sulfur.dioxide = col_double(),
##   ..   total.sulfur.dioxide = col_double(),
##   ..   density = col_double(),
##   ..   pH = col_double(),
##   ..   sulphates = col_double(),
##   ..   alcohol = col_double(),
##   ..   quality = col_double()
##   .. )
##  fixed.acidity   volatile.acidity  citric.acid    residual.sugar  
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900  
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900  
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200  
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539  
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600  
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500  
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide    density      
##  Min.   :0.01200   Min.   : 1.00       Min.   :  6.00       Min.   :0.9901  
##  1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00       1st Qu.:0.9956  
##  Median :0.07900   Median :14.00       Median : 38.00       Median :0.9968  
##  Mean   :0.08747   Mean   :15.87       Mean   : 46.47       Mean   :0.9967  
##  3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00       3rd Qu.:0.9978  
##  Max.   :0.61100   Max.   :72.00       Max.   :289.00       Max.   :1.0037  
##        pH          sulphates         alcohol      quality
##  Min.   :2.740   Min.   :0.3300   Min.   : 8.40   3: 10  
##  1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50   4: 53  
##  Median :3.310   Median :0.6200   Median :10.20   5:681  
##  Mean   :3.311   Mean   :0.6581   Mean   :10.42   6:638  
##  3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10   7:199  
##  Max.   :4.010   Max.   :2.0000   Max.   :14.90   8: 18

Our dataset has 12 variables with 1599 observations.
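The output above is consistent with calls to `dim()`, `str()`, and `summary()`. The loading code is not shown in this report, so the following is only a minimal sketch: the data frame name `wine` is an assumption, and the toy data holds just the first four rows of three columns, copied from the `str()` output.

```r
# Toy stand-in for the wine data (hypothetical name `wine`).
# Values are the first four rows shown in the str() output above.
wine <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2),
  alcohol       = c(9.4, 9.8, 9.8, 9.8),
  quality       = c(5, 5, 5, 6)
)

# Treat quality as an ordered factor with levels 3 to 8,
# matching the "Ord.factor w/ 6 levels" line in the str() output.
wine$quality <- factor(wine$quality, levels = 3:8, ordered = TRUE)

dim(wine)      # number of rows and columns
str(wine)      # column types and first values
summary(wine)  # numeric summaries and counts per quality level
```

Converting `quality` to an ordered factor is what makes `summary()` report counts per level instead of quartiles.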

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

The graph shows the distribution of alcohol percentage in the wines. Most wines have an alcohol content between 9% and 11%.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.210   3.310   3.311   3.400   4.010

The graph shows the distribution of pH in the wines. Most wines have a pH between 3.0 and 3.5.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500

The graph shows the distribution of residual sugar in wines. The highest residual sugar in the wines is 15.5 g/dm^3.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100

The graph shows the distribution of chlorides in the wines. Most wines contain little salt: three quarters of them have 0.09 g/dm^3 or less.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800

The graph shows the distribution of volatile acidity in the wines. Most wines have a low volatile acidity, below 1.2 g/dm^3.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0037

The graph shows the distribution of density in the wines. The density values appear to be approximately normally distributed.

##   3   4   5   6   7   8 
##  10  53 681 638 199  18

The graph shows the distribution of quality in the wines. Most wines have a quality of 5 or 6.
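To put a number on "most", the counts in the table above can be turned into shares. This is a small sketch using only those counts; the variable names are illustrative.

```r
# Quality counts copied from the table above.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Share of each quality level, in percent.
quality_share <- round(100 * quality_counts / sum(quality_counts), 1)
quality_share

# Combined share of quality levels 5 and 6, as a proportion.
share_5_6 <- sum(quality_counts[c("5", "6")]) / sum(quality_counts)
round(100 * share_5_6, 1)  # about 82.5 percent
```

Levels 5 and 6 together account for 1319 of the 1599 wines.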

Univariate Analysis

What is the structure of your dataset?

There are 1599 observations in our dataset with 12 variables (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulphur dioxide, total sulphur dioxide, density, pH, sulphates, alcohol, quality). Quality is the only categorical variable: an ordered factor with six levels, from 3 to 8. The other 11 variables are numeric.

Other observations:

  • The majority of wines have a density between 0.995 and 1.000.

  • Most wines have a volatile acidity of less than 0.8 g/dm^3, which suggests a pleasant taste.

  • Most wines contain a small amount of salt, less than 0.2 g/dm^3.

  • The mean alcohol percentage of the wines is about 10.4%.

What is the main feature of interest in your dataset?

The main feature of interest in this dataset is quality, the quality level of the wine. I want to know which features of a wine influence its quality and can be used to predict its quality level.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Alcohol, pH, residual sugar, chlorides, and fixed acidity are features that may influence the quality of a wine.

Bivariate Plots Section

##                      fixed.acidity volatile.acidity citric.acid residual.sugar
## fixed.acidity                 1.00            -0.26        0.67           0.11
## volatile.acidity             -0.26             1.00       -0.55           0.00
## citric.acid                   0.67            -0.55        1.00           0.14
## residual.sugar                0.11             0.00        0.14           1.00
## chlorides                     0.09             0.06        0.20           0.06
## free.sulfur.dioxide          -0.15            -0.01       -0.06           0.19
## total.sulfur.dioxide         -0.11             0.08        0.04           0.20
## density                       0.67             0.02        0.36           0.36
## pH                           -0.68             0.23       -0.54          -0.09
## sulphates                     0.18            -0.26        0.31           0.01
## alcohol                      -0.06            -0.20        0.11           0.04
##                      chlorides free.sulfur.dioxide total.sulfur.dioxide density
## fixed.acidity             0.09               -0.15                -0.11    0.67
## volatile.acidity          0.06               -0.01                 0.08    0.02
## citric.acid               0.20               -0.06                 0.04    0.36
## residual.sugar            0.06                0.19                 0.20    0.36
## chlorides                 1.00                0.01                 0.05    0.20
## free.sulfur.dioxide       0.01                1.00                 0.67   -0.02
## total.sulfur.dioxide      0.05                0.67                 1.00    0.07
## density                   0.20               -0.02                 0.07    1.00
## pH                       -0.27                0.07                -0.07   -0.34
## sulphates                 0.37                0.05                 0.04    0.15
## alcohol                  -0.22               -0.07                -0.21   -0.50
##                         pH sulphates alcohol
## fixed.acidity        -0.68      0.18   -0.06
## volatile.acidity      0.23     -0.26   -0.20
## citric.acid          -0.54      0.31    0.11
## residual.sugar       -0.09      0.01    0.04
## chlorides            -0.27      0.37   -0.22
## free.sulfur.dioxide   0.07      0.05   -0.07
## total.sulfur.dioxide -0.07      0.04   -0.21
## density              -0.34      0.15   -0.50
## pH                    1.00     -0.20    0.21
## sulphates            -0.20      1.00    0.09
## alcohol               0.21      0.09    1.00

The plot shows the pairwise correlations between the numeric variables.
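A rounded correlation matrix like the one above can be produced with `cor()` and `round()`. The sketch below uses simulated data, not the wine data: the data frame `toy`, its coefficients, and the seed are all made up for illustration.

```r
# Simulated stand-in data (NOT the wine data); names mirror three of its columns.
set.seed(1)
n <- 200
fixed.acidity <- rnorm(n, mean = 8.3, sd = 1.7)

toy <- data.frame(
  fixed.acidity = fixed.acidity,
  # pH is built to fall as fixed acidity rises, plus noise.
  pH            = 3.9 - 0.07 * fixed.acidity + rnorm(n, sd = 0.1),
  alcohol       = rnorm(n, mean = 10.4, sd = 1)
)

# Pairwise Pearson correlations of all columns, rounded to two decimals.
cor_matrix <- round(cor(toy), 2)
cor_matrix
```

On the real data, the ordered factor `quality` would have to be dropped or converted first, since `cor()` accepts only numeric columns.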

## [1] 0.6717034

The scatter plot shows the relationship between fixed acidity and citric acid. The two variables are positively correlated (about 0.67).

## [1] 0.6680473

The scatter plot shows the relationship between fixed acidity and density. The two variables are positively correlated (about 0.67).

## [1] -0.6829782

The scatter plot shows the relationship between fixed acidity and pH. The two variables are negatively correlated (about -0.68).

## [1] -0.5524957

The scatter plot shows the relationship between volatile acidity and citric acid. The two variables are negatively correlated (about -0.55).
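Single coefficients like the ones printed above come from calling `cor()` on two vectors. As a sketch, the example below uses only the first ten values of fixed acidity and citric acid shown in the `str()` output, so its result will differ from the full-data value of 0.67.

```r
# First ten values of each variable, copied from the str() output.
fixed_acidity <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5)
citric_acid   <- c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36)

# Pearson correlation (the default method of cor()).
r <- cor(fixed_acidity, citric_acid)
r
```

Even on this small subset the sign is positive, in line with the full-data result.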

## $`3`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   8.400   9.725   9.925   9.955  10.575  11.000 
## 
## $`4`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.00    9.60   10.00   10.27   11.00   13.10 
## 
## $`5`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     8.5     9.4     9.7     9.9    10.2    14.9 
## 
## $`6`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.80   10.50   10.63   11.30   14.00 
## 
## $`7`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.20   10.80   11.50   11.47   12.10   14.00 
## 
## $`8`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.80   11.32   12.15   12.09   12.88   14.00

The boxplots show the distribution of alcohol by quality level. Quality level 5 has many outliers: most wines at this level have a low alcohol percentage, but some have a much higher one.
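The per-level summaries above have the shape produced by `tapply()` with `summary()`. The report does not show the call, so this is only a plausible reconstruction on a made-up data frame `toy`.

```r
# Made-up example data (NOT the wine data).
toy <- data.frame(
  alcohol = c(9.4, 9.8, 10.0, 10.5, 11.2, 12.1, 12.8, 9.5),
  quality = factor(c(5, 5, 5, 6, 6, 7, 8, 6), levels = 3:8, ordered = TRUE)
)

# Drop quality levels that do not occur in the toy data.
toy$quality <- droplevels(toy$quality)

# One summary() per quality level, printed as $`5`, $`6`, ...
alcohol_by_quality <- tapply(toy$alcohol, toy$quality, summary)
alcohol_by_quality
```

The same pattern works for any numeric column, for example `tapply(toy$alcohol, toy$quality, median)` to compare medians only.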

## $`3`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.160   3.312   3.390   3.398   3.495   3.630 
## 
## $`4`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.300   3.370   3.382   3.500   3.900 
## 
## $`5`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.880   3.200   3.300   3.305   3.400   3.740 
## 
## $`6`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.860   3.220   3.320   3.318   3.410   4.010 
## 
## $`7`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.920   3.200   3.280   3.291   3.380   3.780 
## 
## $`8`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.880   3.163   3.230   3.267   3.350   3.720

The boxplots show the distribution of pH by quality level. Wines of quality 3 have the highest median pH, while wines of quality 8 have the lowest. Several quality levels have outliers at both ends, meaning some wines have a pH above or below the typical range.

## $`3`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.200   1.875   2.100   2.635   3.100   5.700 
## 
## $`4`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.300   1.900   2.100   2.694   2.800  12.900 
## 
## $`5`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.200   1.900   2.200   2.529   2.600  15.500 
## 
## $`6`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.477   2.500  15.400 
## 
## $`7`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.200   2.000   2.300   2.721   2.750   8.900 
## 
## $`8`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.400   1.800   2.100   2.578   2.600   6.400

The boxplots show the distribution of residual sugar by quality level. Wines of quality 3, 4, and 8 have the lowest median residual sugar. Quality levels 5 and 6 have many outliers, meaning some wines at those levels have more sugar than the typical range.

## $`3`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4400  0.6475  0.8450  0.8845  1.0100  1.5800 
## 
## $`4`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.230   0.530   0.670   0.694   0.870   1.130 
## 
## $`5`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.180   0.460   0.580   0.577   0.670   1.330 
## 
## $`6`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1600  0.3800  0.4900  0.4975  0.6000  1.0400 
## 
## $`7`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3000  0.3700  0.4039  0.4850  0.9150 
## 
## $`8`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2600  0.3350  0.3700  0.4233  0.4725  0.8500

The boxplots show the distribution of volatile acidity by quality level. Wines of quality 3 have a high volatile acidity, while wines of quality 8 have a low one. Quality levels 5 and 6 have many outliers, meaning some wines at those levels have more volatile acidity than the typical range.
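A boxplot marks a point as an outlier when it lies more than 1.5 times the box height beyond the hinges. The sketch below applies that rule to a made-up vector, using `boxplot.stats()` and then the same rule by hand.

```r
# Made-up volatile acidity values (NOT taken from the wine data).
x <- c(0.46, 0.50, 0.52, 0.58, 0.60, 0.62, 0.67, 1.33)

# boxplot.stats() returns the points a boxplot would draw as outliers.
stats <- boxplot.stats(x)
stats$out

# The same rule by hand: hinges from fivenum(), fences 1.5 box heights away.
hinges <- fivenum(x)[c(2, 4)]
fence  <- hinges + c(-1.5, 1.5) * diff(hinges)
manual_out <- x[x < fence[1] | x > fence[2]]
manual_out
```

Here the hinges are 0.51 and 0.645, so the upper fence is 0.8475 and only 1.33 is flagged.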

Bivariate Analysis

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

The negative relationship between pH and citric acid was interesting. Another interesting relationship was the negative relationship between citric acid and volatile acidity.

What were the strongest relationships you found?

There were strong positive relationships between fixed acidity and citric acid, fixed acidity and density, and free sulphur dioxide and total sulphur dioxide (each about 0.67).

Multivariate Plots Section

This chart shows distribution of alcohol by quality of the wines.

This chart shows distribution of pH by quality of the wines.

This chart shows distribution of residual sugar by quality of the wines.

This chart shows the relationship between pH and the ratio of citric acid to residual sugar.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

Wines with a moderate pH have a low citric acid/pH ratio. Wines at the higher quality levels have a higher alcohol level than the rest. Wines with quality levels of 3 and 8 have a high residual sugar.

Final Plots and Summary

Plot One

Description One

The distribution of residual sugar in the wines is skewed to the right. Most wines have a residual sugar below 4 g/dm^3.
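The right skew can also be read from the summary statistics reported earlier for residual sugar: the mean exceeds the median, and the upper tail is far longer than the lower one. A short check, using only those reported values:

```r
# Summary values for residual sugar, copied from the summary() output.
rs <- c(min = 0.900, q1 = 1.900, median = 2.200,
        mean = 2.539, q3 = 2.600, max = 15.500)

# In a right-skewed distribution the mean is pulled above the median.
mean_above_median <- rs[["mean"]] > rs[["median"]]

# Tail lengths outside the box: upper tail versus lower tail.
upper_tail <- rs[["max"]] - rs[["q3"]]
lower_tail <- rs[["q1"]] - rs[["min"]]

mean_above_median
c(upper = upper_tail, lower = lower_tail)
```

The upper tail spans 12.9 g/dm^3 against 1.0 g/dm^3 for the lower tail.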

Plot Two

Description Two

Wines with a quality level of 5 have a low median alcohol percentage, but some of them have a high alcohol percentage and therefore appear as outliers. Wines with a quality level of 8 have a high median alcohol percentage.

Plot Three

Description Three

The scatter plot shows that the citric acid/residual sugar ratio is negatively correlated with the pH of the wines. Wines with a low pH tend to have a high ratio, and the ratio decreases as pH increases.
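The ratio used in this plot is a derived variable. As a sketch of how it can be built and compared with pH, the example below uses only the first ten rows shown in the `str()` output, and the name `ratio` is an assumption, so the coefficient will not match the full-data value.

```r
# First ten values of each variable, copied from the str() output.
citric_acid    <- c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36)
residual_sugar <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1)
pH             <- c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35)

# Derived variable: citric acid per unit of residual sugar (hypothetical name).
ratio <- citric_acid / residual_sugar

# Pearson correlation between the ratio and pH on this subset.
r_ratio_pH <- cor(ratio, pH)
r_ratio_pH
```

On these ten rows the coefficient is negative, in line with the direction described above.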

Reflection

The wineQualityReds dataset contains information on 1599 red wines and their features. I explored the variables in the dataset to understand them, and then explored the relationship between wine quality and the other variables.

There was a strong positive relationship between fixed acidity and citric acid, and a similar relationship between fixed acidity and density (both about 0.67). The surprising thing was that pH and citric acid had a negative correlation despite being closely related.

A limitation of this dataset is that it contains observations of red wines only; other types, such as white wine, are not included. Given that the dataset contains data from 2009, the analysis may not reflect wines produced from 2010 to 2021.